Abstract: We consider the problem of communicating over a
channel for which no mathematical model is specified. We present
achievable rates as a function of the channel input and output
sequences known a-posteriori for discrete and continuous
channels. Furthermore, we present a rate-adaptive scheme employing
feedback which achieves these rates asymptotically without prior
knowledge of the channel behavior.

I. Introduction 

The problem of communicating over a channel with an
individual, predetermined noise sequence which is not known
to the sender and receiver was addressed by Shayevitz and
Feder [1][2] and Eswaran et al. [3][4]. The simple example
discussed in [1] is of a binary channel $y_n = x_n \oplus e_n$ where
the error sequence $e_n$ can be any unknown sequence. Using
perfect feedback and common randomness, communication is
shown to be possible at a rate approaching the capacity of the
binary symmetric channel (BSC) whose error probability
equals the empirical error probability of the sequence (the
relative number of 1-s in $e_n$). Subsequently both authors
extended this model to general discrete channels and
modulo-additive channels ([3], [2] resp.) with an individual state
sequence, and showed that the empirical mutual information can
be attained.

In this work we take this model one step further. We con- 
sider a channel where no specific probabilistic or mathematical 
relation between the input and the output is assumed. We 
term this channel an individual channel and we would like to 
characterize the achievable rate using only the input and output 
sequences. The decoder may have a feedback link in which 
the channel output or other information from the decoder can 
be sent back. Without this feedback it would not be possible to 
match the rate of transmission to the quality of the channel so 
outage would be inevitable. This model has various advantages 
and disadvantages compared to the classical one, however 
there is no question about the reality of the model: this is the 
only channel model that we know for sure exists in nature. 
This point of view is similar to the approach used in universal 
source coding of individual sequences where the goal is to 
asymptotically attain for each sequence the same coding rate 
achieved by the best encoder from a model class, tuned to the 
sequence. 

Just to inspire thought, let us ask the following question:
suppose the sequence $\{x_i\}_{i=1}^{n}$ with power $P = \frac{1}{n}\sum_{i=1}^{n} x_i^2$
encodes a message and is transmitted over a continuous
real-valued input channel. The output sequence is $\{y_i\}_{i=1}^{n}$. One
can think of $v_i = y_i - x_i$ as a noise sequence and measure its
power $N = \frac{1}{n}\sum_{i=1}^{n} v_i^2$. The rate $R = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$ is the
capacity of a Gaussian additive channel with the same noise
variance. Is the rate R also achievable in the individual case,
under appropriate definitions?

The way it was posed, the answer to this question would
be "no", since this model predicts a rate of 1/2 bit/use for the
channel whose output is $y_i = 0$ for all i, which cannot convey any
information. However with the slight restatement done in the
next section (see Eq. (2) below) the answer would be "yes".

We consider two classes of individual channels: discrete 
input and output channels and continuous real valued input and 
output channels. In both cases we assume that feedback and 
common randomness exist (perfect feedback is not required). 
In [5] we also address the case where feedback does not exist,
which yields interesting results, but to keep the presentation
concise we focus here on the more important case of feedback
communication. The main result is that with a small amount of
feedback, communication at a rate close to the empirical
mutual information (or its Gaussian equivalent for continuous
channels) can be achieved, without any prior knowledge, or
assumptions, about the channel structure. Here we present the
main result and the communication scheme obtaining it and 
give an outline of the proof. The full proof is omitted and 
appears in [5]. We also give several examples and highlight 
areas for further study. 

II. Overview of the main results obtained so far 

We start with a high level overview of the definitions 
and results. The discussion below is conceptual rather than 
accurate, while the detailed definitions follow in the next 
section. 

We say a given rate function $R_{emp}: \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}$
is achieved by a communication scheme with feedback if
for large block size n, data at rate close to or exceeding
$R_{emp}(\mathbf{x}, \mathbf{y})$ is decoded successfully with probability
arbitrarily close to 1 for every output sequence and almost every input
sequence. Roughly speaking, this means that in any instance
of the system operation, where a specific x was the input and a
specific y was the output, the communication rate had been at
least $R_{emp}(\mathbf{x}, \mathbf{y})$. Note that the only statistical assumptions are
related to the common randomness, and we consider the rate
(message size) and error probability conditioned on a specific
input and output, where the error probability is averaged over
common randomness.

The definition of achievability is not complete without
stating the input distribution, since it affects the empirical
rate. For example, by setting x = 0 one can attain every rate
function where $R_{emp}(0, \mathbf{y}) = 0$ in a void way, since other x
sequences will never appear. In contrast with classical results
of information theory, we do not use the input distribution only
as a means to show the existence of good codes: taking advantage
of the common randomness we require the encoder to
emit input symbols that are random and distributed according
to a defined prior (currently we assume i.i.d. distribution).

In this paper we focus on rate functions that depend on
the instantaneous (zero order) empirical statistics. Extension to
higher order models seems technical. For the discrete channel
we show that a rate

$R_{emp} = \hat{I}(\mathbf{x}; \mathbf{y})$ (1)

is achievable with any input distribution Q(x), where $\hat{I}(\cdot;\cdot)$
denotes the empirical mutual information [5]. For the continuous
(real valued) channel we show that a rate

$R_{emp} = \frac{1}{2}\log\left(\frac{1}{1-\rho^2}\right)$ (2)

is achievable with Gaussian input distribution $\mathcal{N}(0, P)$, where
$\rho = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$ is the empirical correlation factor between the
input and output sequences (at this stage for simplicity $\rho$ is
defined in a slightly non standard way without subtracting the
mean). Although the result regarding the continuous case is
less tight, we show in [5] that this is the best rate function
that can be defined by second order moments, and it is tight
for the Gaussian additive channel (for this channel $\rho^2 = \frac{P}{P+N}$,
therefore $R_{emp} = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$). The same rates apply
also to the case of communication without feedback, where
achievability is defined by the ability to decode a fixed rate R
whenever $R_{emp} > R$.
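For concreteness, the zero-order empirical mutual information of Eq. (1) can be written out from the joint empirical distribution of the two sequences. What follows is the standard type-based definition, not a quotation from [5]; the symbols $\hat{P}$ and the indicator notation are introduced here only for illustration.

```latex
\hat{P}_{\mathbf{x}\mathbf{y}}(a,b) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{x_i = a,\; y_i = b\},
\qquad
\hat{P}_{\mathbf{x}}(a) = \sum_{b}\hat{P}_{\mathbf{x}\mathbf{y}}(a,b),
\qquad
\hat{P}_{\mathbf{y}}(b) = \sum_{a}\hat{P}_{\mathbf{x}\mathbf{y}}(a,b)

\hat{I}(\mathbf{x};\mathbf{y}) = \sum_{a,b} \hat{P}_{\mathbf{x}\mathbf{y}}(a,b)\,
  \log\frac{\hat{P}_{\mathbf{x}\mathbf{y}}(a,b)}{\hat{P}_{\mathbf{x}}(a)\,\hat{P}_{\mathbf{y}}(b)}
```

Terms with $\hat{P}_{\mathbf{x}\mathbf{y}}(a,b) = 0$ are taken as zero, as usual.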

We may now rephrase our example question from the
introduction so that it will have an affirmative answer: given
the input and output sequences, describe the output by the
virtual additive channel with a gain $y_i = \alpha x_i + v_i$, so the
effective noise sequence is $v_i = y_i - \alpha x_i$. Choose $\alpha$ so that
$\mathbf{v} \perp \mathbf{x}$, i.e. $\sum_i v_i x_i = 0$. An equivalent condition is that $\alpha$
minimizes $\|\mathbf{v}\|^2$. The resulting $\alpha$ is the LMMSE coefficient in
estimation of y from x (assuming zero mean), i.e. $\alpha = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|^2}$.
Define the effective noise power as $N = \frac{1}{n}\sum_{i=1}^{n} v_i^2$, and the
effective $\mathrm{SNR} = \frac{\alpha^2 P}{N}$. It is easy to check that $\mathrm{SNR} = \frac{\rho^2}{1-\rho^2}$.
Then according to Eq. (2) the rate $R = \frac{1}{2}\log(1 + \mathrm{SNR})$
is achievable, in the sense defined above. Reexamining the
counter example we gave above, in this model if we set y = 0
we obtain $\rho = 0$ and therefore $R_{emp} = 0$, or equivalently the
effective channel has v = 0 and $\alpha = 0$, therefore SNR = 0
(instead of v = -x, $\alpha = 1$ and SNR = 1).
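The identity between the effective SNR and the correlation factor can be verified in a few lines. The derivation below is a sketch that assumes P denotes the empirical input power $\|\mathbf{x}\|^2/n$, as in the introduction, and uses $\rho = \mathbf{x}^T\mathbf{y} / (\|\mathbf{x}\|\,\|\mathbf{y}\|)$.

```latex
\alpha = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|^2}
\;\Rightarrow\;
\|\mathbf{v}\|^2 = \|\mathbf{y}-\alpha\mathbf{x}\|^2
 = \|\mathbf{y}\|^2 - \frac{(\mathbf{x}^T\mathbf{y})^2}{\|\mathbf{x}\|^2}
 = \|\mathbf{y}\|^2\,(1-\rho^2)

\alpha^2 P = \frac{(\mathbf{x}^T\mathbf{y})^2}{\|\mathbf{x}\|^4}\cdot\frac{\|\mathbf{x}\|^2}{n}
 = \frac{\rho^2\,\|\mathbf{y}\|^2}{n},
\qquad
N = \frac{\|\mathbf{v}\|^2}{n} = \frac{\|\mathbf{y}\|^2\,(1-\rho^2)}{n}

\mathrm{SNR} = \frac{\alpha^2 P}{N} = \frac{\rho^2}{1-\rho^2},
\qquad
\tfrac{1}{2}\log(1+\mathrm{SNR}) = \tfrac{1}{2}\log\frac{1}{1-\rho^2}
```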

As will be seen, we achieve these rates by random coding
and universal decoders, and use iterated instances of rateless
coding. The scheme is able to operate asymptotically with
"zero rate" feedback (meaning any positive capacity of the
feedback channel suffices). A similar although more complicated
scheme was used in [3]. The main differences are the
use of training to evaluate the stopping condition as well as a
different code construction, and are summarized in [5].



The classical point of view first assumes a channel model 
and then devises a communication system optimized for it. 
Here we take the inverse direction: we devise a communication 
system without assumptions on the channel which guarantees 
rates depending on channel behavior. The channel model we
assume is more stringent than the probabilistic and
semi-probabilistic models since we make fewer assumptions about the
channel, and the error probability and rate are required to be
met for (almost) every input and output sequence (rather than
on average). This change of viewpoint does not make probabilistic
or semi-probabilistic channel models redundant but
merely suggests an alternative. By using a channel model we
can formalize questions relating to optimality such as capacity 
(single user, networks) and error exponent as well as guarantee 
a communication rate a-priori. Another aspect is that we pay 
a price for universality. Even if one considers an individual 
channel scheme that guarantees asymptotically optimum rates 
over a large class of channels, it can never consider all possible 
channels (block-wise), and for a finite block size it will have 
larger overhead (a reduction in the amount of information 
communicated with same error probability) compared to a 
scheme optimized for the specific channel. 

Several concepts used in this work, such as common randomness
and rateless coding, are borrowed from prior work
on arbitrarily varying channels (AVC, see for example [7][8]),
compound channels with feedback [9][10] and individual noise
sequence channels with feedback [2][3]. It is worth noting [11],
where a somewhat similar concept was used in defining an 
achievable communication rate by properties of the channel 
input and output. An important observation is that a strict 
definition of capacity exists only for fixed rate systems (where 
the capacity is the supremum of achievable rates) while in rate 
adaptive communication there is some freedom in determining 
the rate function. 

Following our results, the individual channel approach be- 
comes a very natural starting point for determining achievable 
rates for various probabilistic and semi-probabilistic models 
(AVC, individual noise sequences, probabilistic models, com- 
pound channels) under the realm of randomized encoders, 
since the achievable rates for these models follow easily from 
the achievable rates for specific sequences, and the law of large 
numbers. We will give some examples later on. 

III. Definition of variable rate communication system with feedback

A randomized block encoder and decoder pair for the
channel $\mathcal{X} \to \mathcal{Y}$ (defined by the two alphabets $\mathcal{X}, \mathcal{Y}$) with
block length n, adaptive rate and feedback communicates a
message expressed by the infinite sequence $w_1^{\infty} \in \{0,1\}^{\infty}$.
The system is defined using a random variable S distributed
over the set $\mathcal{S}$ (the common randomness) and a feedback
alphabet $\mathcal{F}$. The encoder is defined by a series of mappings
$x_k = \phi_k(\mathbf{w}, s, f^{k-1})$ and the decoder is defined by
a feedback function $f_k = \varphi_k(y^k, s)$, a decoding function
$\hat{\mathbf{w}} = \bar{\phi}(\mathbf{y}, s)$ and a rate function $R = r(\mathbf{y}, s)$. The error
probability for message $w_1^{\infty}$ is defined as
$P_e(\mathbf{x}, \mathbf{y}) = \Pr\left(\hat{w}_1^{\lceil nR \rceil} \neq w_1^{\lceil nR \rceil} \,\middle|\, \mathbf{x}, \mathbf{y}\right)$, i.e. recovery of the first $\lceil nR \rceil$
bits by the decoder is considered a successful reception. This
system is illustrated in Figure 1.

IV. Statement of the main result 
We consider two cases: 

1) discrete: The input and output alphabets $\mathcal{X}, \mathcal{Y}$ are
discrete and finite, and the prior Q(x) can be arbitrarily
chosen

2) continuous: The input and output alphabets are real valued,
$\mathcal{X} = \mathcal{Y} = \mathbb{R}$, and the prior is Gaussian, $Q = \mathcal{N}(0, P)$

The scheme proposed below satisfies the following theorem 
with respect to these two cases: 

Theorem 1 (Theorems 3,4 of [5]). For every $P_e, P_A, \delta, \bar{R} > 0$
there is n large enough and a random encoder and decoder with
feedback and variable rate over block size n with a subset
$J \subset \mathcal{X}^n$, such that:

• The distribution of the input sequence is $\mathbf{x} \sim Q^n$
independently of the feedback and message

• The probability of error is smaller than $P_e$ for any x, y

• For any input sequence $\mathbf{x} \notin J$ and output sequence
$\mathbf{y} \in \mathcal{Y}^n$ the rate is $R \ge \min\left[R_{emp}(\mathbf{x}, \mathbf{y}) - \delta, \bar{R}\right]$, where

$R_{emp}(\mathbf{x}, \mathbf{y}) = \begin{cases} \hat{I}(\mathbf{x}; \mathbf{y}) & \text{discrete} \\ \frac{1}{2}\log\left(\frac{1}{1-\rho^2(\mathbf{x}, \mathbf{y})}\right) & \text{continuous} \end{cases}$ (3)

• The probability of the subset J is bounded by
$\Pr(\mathbf{x} \in J) \le P_A$

The limit $\bar{R}$, which can be arbitrarily large, reflects the fact
that the communication rate is finite, even when $R_{emp} = \infty$ ($\rho^2 = 1$
in the continuous case). In the discrete case $\bar{R}$ can be
omitted (by selecting $\bar{R} = \log\min(|\mathcal{X}|, |\mathcal{Y}|) \ge \hat{I}(\mathbf{x}; \mathbf{y})$).

Regarding the subset J, as we shall see in the proof outline
there are some sequences for which poor rate is obtained,
and since we committed to an input distribution we cannot
avoid them. However there is an important distinction between
claiming for example that "for each y the probability of
$R < R_{emp}$ is at most $P_A$" and the claim made in the theorem
that "$R < R_{emp}$ only when x belongs to a subset J with
probability at most $P_A$". The first claim is weaker since
choosing y as a function of x may potentially increase the
probability of $R < R_{emp}$ beyond $P_A$, by attempting to select
for every x a sequence y for which x is a bad input sequence.
This weakness is avoided in the second claim. A consequence
of this definition is that the probability of $R < R_{emp}$ is
bounded by $P_A$ for any conditional probability $\Pr(\mathbf{y}|\mathbf{x})$ over
the sequences. The probability $P_A$ can be absorbed into $P_e$
with the implication that the error probability becomes limited
to the set J (see [5]).

V. The proposed rate adaptive scheme 

The following communication scheme sends B indices from
{1, . . . , M} over n channel uses (or equivalently sends the
number $\theta \in [0, 1)$ in resolution $M^{-B}$), where M is fixed, and
B varies according to empirical channel behavior. The building
block is a rateless transmission of one of M codewords
($K = \log(M)$ information units), which is iterated until the
n-th symbol is reached. The codebook $C_{M \times n}$ consists of M
codewords of length n, where all $M \times n$ symbols are drawn
i.i.d. ~ Q and known to the sender and receiver.

In each rateless block b = 1, 2, . . ., a new index $i = i_b \in$
{1, . . . , M} is sent to the receiver. k denotes the absolute time
index $1 \le k \le n$. Block b starts from index $k_b$, where $k_1 = 1$,
and b is incremented following the decoder's decision to
terminate a block. After symbol n is reached the transmission
stops and the number of blocks sent is B = b - 1. The
transmission of each block b follows the procedure described
below:

1) The encoder sends index $i = i_b$ by sending the symbols
of codeword i: $x_k = C_{i,k}$, and incrementing k until
the decoder announces the end of the block. Note
that different blocks use different symbols from the
codebook.

2) The decoder announces the end of the block after symbol
m in the block ($m = k - k_b + 1$) if for some codeword $\mathbf{x}_i$

$R_{emp}\left((\mathbf{x}_i)_{k_b}^{k}, \mathbf{y}_{k_b}^{k}\right) \ge \mu^*_m$ (4)

where $(\mathbf{x}_i)_{k_b}^{k}$ denotes symbols $k_b, \ldots, k$ of codeword i,
$R_{emp}(\mathbf{x}, \mathbf{y})$ defined in Eq. (3) is used as the
decoding metric and $\mu^*_m$ is a threshold defined in Eq. (5)
below.

3) When the end of block is announced, one of the i
fulfilling Eq. (4) is determined as the index of the decoded
codeword $\hat{i}_b$ (breaking ties arbitrarily).

4) If symbol n is reached without fulfilling Eq. (4), then the
last block is terminated without decoding.

The threshold $\mu^*_m$ is defined as:

$\mu^*_m = \begin{cases} \frac{K + \log\left(\frac{n}{P_e}\right) + |\mathcal{X}||\mathcal{Y}|\log(m+1)}{m} & \text{discrete} \\ \frac{K + \log\left(\frac{2n}{P_e}\right)}{m-1} & \text{continuous} \end{cases}$ (5)



The scheme achieves the claims of Theorem 1 when K
is chosen to increase as $O(\log(n)) < K < O(n)$. The
scheme uses one bit of feedback per channel use; however, the
same asymptotic rates are obtained if (possibly delayed)
feedback is sent only once every T symbols (for any T > 0),
therefore we can claim the theorem holds with "zero rate"
feedback.
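The steps of the scheme can be sketched as a small simulation for the binary discrete case. This is an illustrative sketch and not the construction analyzed in [5]: the function names, the toy parameters (n = 2000, M = 16) and the threshold form (K + log(n/P_e) + |X||Y| log(m+1)) / m are assumptions made here, the last one chosen to be consistent with a union bound over M codewords and n decoding attempts. Rates are measured in nats inside the code.

```python
import math
import random
from collections import Counter


def empirical_mi(x, y):
    """Zero-order empirical mutual information of two equal-length sequences, in nats."""
    m = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / m * math.log(c * m / (px[a] * py[b]))
               for (a, b), c in joint.items())


def threshold(m, K, n, p_e, alphabet_product):
    """Assumed discrete-case threshold: (K + log(n/P_e) + |X||Y| log(m+1)) / m."""
    return (K + math.log(n / p_e) + alphabet_product * math.log(m + 1)) / m


def run_scheme(channel, n=2000, M=16, p_e=0.01, seed=0):
    """Simulate the iterated rateless scheme over a binary-input, binary-output channel.

    channel(k, x_k) returns the output symbol at time k; it may be any rule,
    since no channel model is assumed. Returns the sent and decoded indices,
    one entry per decoded block.
    """
    rng = random.Random(seed)
    K = math.log(M)
    # Codebook shared by encoder and decoder: M x n symbols drawn i.i.d. Ber(1/2).
    codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]
    sent, decoded, y = [], [], []
    k_b = 0                       # first time index of the current block (0-based)
    index = rng.randrange(M)      # message index carried by the current block
    for k in range(n):
        y.append(channel(k, codebook[index][k]))
        m = k - k_b + 1
        mu = threshold(m, K, n, p_e, alphabet_product=4)
        y_block = y[k_b:k + 1]
        hits = [i for i in range(M)
                if empirical_mi(codebook[i][k_b:k + 1], y_block) >= mu]
        if hits:                  # decoder announces end of block (one feedback bit)
            sent.append(index)
            decoded.append(hits[0])   # ties broken arbitrarily (lowest index here)
            k_b = k + 1
            index = rng.randrange(M)
    return sent, decoded          # an unfinished last block is dropped


if __name__ == "__main__":
    # An "individual" channel: every 10th symbol is flipped, the rest pass unchanged.
    sent, decoded = run_scheme(lambda k, x: x ^ int(k % 10 == 0))
    bits = len(decoded) * math.log2(16)
    print(f"blocks decoded: {len(decoded)}, rate: {bits / 2000:.3f} bit/use")
```

With such a small K the fixed overhead terms in the threshold dominate, so the rate obtained is far below the empirical mutual information; this is in line with the requirement that K grow with n.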

VI. Outline of the proof of the main result 

The error analysis is based on two lemmas (Lemma 1 and
Lemma 4 of [5]) which state that the probability of the metric
$R_{emp}$ used in section V to exceed a given threshold t when
the m-length x is drawn i.i.d. independently of y (and from a
Gaussian distribution in the continuous case), is approximately
exp(-mt), or more accurately:

$Q^m\left(R_{emp}(\mathbf{x}, \mathbf{y}) \ge t\right) \le \begin{cases} \exp\left(-m(t - \delta_m)\right) & \text{discrete} \\ 2\exp\left(-(m-1)t\right) & \text{continuous} \end{cases}$ (6)

where $\delta_m = |\mathcal{X}||\mathcal{Y}|\frac{\log(m+1)}{m} \to 0$. This bound determines the
pairwise error probability. Using this bound with $t = \mu^*_m$ and
the union bound (over M - 1 competing codewords and over
n decoding attempts), we show that the error probability is
bounded below $P_e$.
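As a sanity check, the continuous-case bound can be compared against a Monte Carlo estimate. The sketch below assumes rates are measured in nats and that the metric is $-\frac{1}{2}\log(1-\rho^2)$ with $\rho$ computed without mean removal; the output sequence y, the values m = 20 and t = 0.2, and the function names are arbitrary choices made here for illustration.

```python
import math
import random


def r_emp(x, y):
    """Continuous-case metric -1/2 log(1 - rho^2), in nats, without mean removal."""
    dot = sum(a * b for a, b in zip(x, y))
    rho2 = dot * dot / (sum(a * a for a in x) * sum(b * b for b in y))
    # Guard against rho2 rounding to 1.0 or slightly above in floating point.
    return -0.5 * math.log(max(1.0 - rho2, 1e-300))


def exceed_probability(y, t, trials=20000, seed=1):
    """Monte Carlo estimate of Pr{R_emp(x, y) >= t} for x drawn i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    m = len(y)
    hits = sum(r_emp([rng.gauss(0.0, 1.0) for _ in range(m)], y) >= t
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    m, t = 20, 0.2
    y = [(-1) ** k * (k + 1) for k in range(m)]   # arbitrary fixed output sequence
    print("estimate:", exceed_probability(y, t))
    print("bound:   ", 2 * math.exp(-(m - 1) * t))
```

The estimate does not depend on the particular y up to sampling noise, since only the direction of x relative to y matters.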

The analysis of the rate is more intricate. Basically it relies
on the fact that when a block is decoded, the metric $R_{emp}$
exceeds the threshold $\mu^*_m \approx \frac{K}{m}$ at the last symbol, but it lies
below the threshold at the previous symbol, therefore roughly
speaking, at the end of the block $R_{emp} \approx \mu^*_m \approx \frac{K}{m}$,
where $\frac{K}{m}$ is the instantaneous transmission rate over the block.
Therefore the empirical rate is attained per rateless block and
a convexity argument is used in order to show that the total
rate (average of $R_{emp}$ over blocks) is at least the empirical
rate $R_{emp}(\mathbf{x}, \mathbf{y})$ measured over the complete sequences.

There are several difficulties, however. Considering for ex- 
ample the discrete case, since the rate achieved instantaneously 
over a rateless block is approximately the empirical mutual 
information over the block, we would like to claim that the 
averaged rate over rateless blocks is greater or equal to the 
empirical mutual information over the entire transmission, 
which implies convexity of the empirical mutual information. 
However the mutual information is concave with respect to the 
input distribution. Here, the input distribution is the empirical 
distribution over rateless blocks, whose limits are determined 
during transmission by the decoding rule and depend on the 
channel output y. Another difficulty is that the last symbol of 
each block is not fully utilized: the empirical mutual informa- 
tion crosses the threshold at the last symbol. But whether it 
crosses it just barely, or crosses it by a significant extent, the 
rates our scheme attains remain the same. However a large 
increase in empirical mutual information at the last symbol 
increases the target rate thus increasing the gap between the 
target rate and the rate attained. Here, a "good" channel is 
bad for our purpose. Since we operate under an arbitrary 
channel regime, this increase is not bounded by the average 
information contents of a single symbol. This is especially 
evident in the continuous case where $R_{emp}$ is unbounded.
A similar difficulty arises when bounding the loss from the 
potentially unfinished last block in the transmission: since y 
is arbitrary it can be determined so that this block has the best 
mutual information. 

We resolve the aforementioned difficulties by proving a
property we term "likely convexity" (Lemmas 5,6 in [5]):
given a partitioning of the symbols 1, . . . , n into subsets, we
show that if the number of subsets does not grow too fast, and
independently of their size, there is a group of x sequences
J with vanishing probability, such that if $\mathbf{x} \notin J$, the mutual
information (in the discrete case) and the squared correlation
factor (in the continuous case) are convex up to an arbitrarily
small offset $\Delta$, i.e. the convex combination of mutual information
(resp. $\rho^2$) over the subsets, weighted by their size, exceeds
the mutual information (resp. $\rho^2$) measured over the entire n
symbols minus $\Delta$. The likely convexity is used to bound the
loss from unused symbols (by bounding their number, and
the mutual information or correlation factor, resp.), as well as
show that the mean rate over rateless blocks meets or exceeds
the overall empirical rate (mutual information or its Gaussian
counterpart). Convexity of $R_{emp} = -\frac{1}{2}\log(1 - \rho^2)$ follows
from convexity of $\rho^2$ by Jensen's inequality.

Fig. 2. Illustration of the $R_{emp}$ lower bound of Theorem 1 for the continuous
case ($R_{LB2}$) and the lower bound $R_{LB1}$ shown in the proof in [5], as a
function of $\rho$, for $n = 10^8$, $K = 10^6$, $P_A = 0.001$, $P_e = 0.001$

The likely convexity property results in the existence of the
subset J of bad sequences. An example for such a sequence is
the sequence of n/2 zeros followed by n/2 ones (for the binary
channel), in which at most one block will be sent, and thus the
asymptotic rate tends to 0, although the empirical distribution
is Ber(1/2) ($\hat{H}(\mathbf{x}) = 1$) and the empirical mutual information
may be $\hat{I} = 1$.

Finally, in order to make sure the error probability, the
probability of J, and the various rate offsets inserted by the
communication system and by the proof technique all tend to
zero as $n \to \infty$, the information contents of each block is
required to increase at a rate $O(\log(n)) < K < O(n)$. As
part of the proof in [5] we introduce several lemmas which
seem to constitute fundamental and useful tools in analyzing
individual sequences. Figure 2 illustrates a lower bound for
the rate achieved by the proposed scheme for finite n (termed
$R_{LB1}$) which is calculated in [5], as well as a bound ($R_{LB2}$)
satisfying the form defined in Theorem 1.

VII. Examples 

In this section we give some examples to illustrate the model 
developed in this paper. Further details appear in [5].

A. Non linear channels 

The expression $\frac{1}{2}\log\left(\frac{1}{1-\rho^2}\right)$ determines a rate which is
always achievable using a Gaussian prior, and is useful for
analyzing non linear channels. As an example, transmitter noise
generated by power amplifier distortions is usually modeled as 
an additive noise, although it is correlated with the transmitted 
signal, resulting in an overly optimistic model. Using the 
procedure described in the overview of finding the coefficient 
a such that this noise is orthogonal to the transmitted signal 
we can model the non linearity as an effective gain plus an 
additive noise. The rates computed using this model are always 
achievable, and thus are a practical alternative to calculating 
the channel capacity, and enable simplified modeling of the 
distortion as an additive noise. 
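The effective-gain model can be illustrated numerically. In the sketch below the amplifier is a hypothetical soft clipper (tanh) followed by a small additive noise term; both the nonlinearity and the noise level are assumptions chosen only for illustration, and the function name is not taken from [5].

```python
import math
import random


def effective_channel(x, y):
    """Return (alpha, snr, rate_bits) for the virtual channel y = alpha*x + v, v orthogonal to x."""
    n = len(x)
    xx = sum(a * a for a in x)
    alpha = sum(a * b for a, b in zip(x, y)) / xx      # LMMSE gain (zero mean assumed)
    v = [b - alpha * a for a, b in zip(x, y)]          # effective noise sequence
    noise_power = sum(c * c for c in v) / n
    snr = alpha ** 2 * (xx / n) / noise_power          # alpha^2 * P / N
    return alpha, snr, 0.5 * math.log2(1.0 + snr)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10000)]
# Hypothetical amplifier: soft clipping plus a little additive noise.
y = [math.tanh(a) + 0.05 * rng.gauss(0.0, 1.0) for a in x]
alpha, snr, rate = effective_channel(x, y)
print(f"alpha = {alpha:.3f}, effective SNR = {snr:.2f}, rate = {rate:.2f} bit/use")
```

Part of the distortion is absorbed into the gain (alpha below 1) and only the component orthogonal to the input is counted as noise.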

B. Channels that fail the zero order and the correlation model 

The fact we used the zero-order empirical distribution makes
the scheme less effective for channels with memory. For
example for the error free channel $y_k = x_{k-1}$ the achieved
rate would be 0 (with high probability). Similarly for the
correlation model if $y_k = x_k^2$ then $\rho = 0$. The remedy should
be sought in employing higher order empirical distributions
and in the continuous case in using tighter approximations of
the empirical statistics (e.g. by higher order statistics).
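Both failure modes can be reproduced with the correlation-based rate. The sketch below applies the delay example to a Gaussian input as well, which is an assumption made here for brevity (the text states it for the zero-order empirical distribution); the sequence length and the function name are likewise illustrative.

```python
import math
import random


def rate_from_correlation(x, y):
    """R_emp = -1/2 log2(1 - rho^2) in bit/use, with rho computed without mean removal."""
    dot = sum(a * b for a, b in zip(x, y))
    rho2 = dot * dot / (sum(a * a for a in x) * sum(b * b for b in y))
    return -0.5 * math.log2(1.0 - rho2)


rng = random.Random(0)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

delayed = [0.0] + x[:-1]        # y_k = x_{k-1}: noiseless, but with memory
squared = [a * a for a in x]    # y_k = x_k^2: deterministic, but uncorrelated with x

print("delay  :", rate_from_correlation(x, delayed))
print("square :", rate_from_correlation(x, squared))
print("linear :", rate_from_correlation(x, [2.0 * a + 0.1 for a in x]))
```

The first two rates come out close to zero even though the output determines the input (up to a delay or a sign), while the linear channel yields a large rate.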

C. Application to other channel models 

As we noted in the overview, the results obtained for
the arbitrary channel model constitute a convenient starting
point for analyzing channel models which have a full or
partial probabilistic behavior. It is clear that results regarding
achievable rates in fully probabilistic, compound, arbitrarily
varying and individual noise sequence models can be obtained
from applying the weak law of large numbers to Theorem 1
(limited, in general, to the randomized encoders regime). For
example the result of [1] for the binary channel $y_n = x_n \oplus e_n$
can be easily reconstructed by applying the scheme with
Q = Ber(1/2), asymptotically approaching (or exceeding) the
rate:

$R_{emp} = \hat{I}(\mathbf{x}; \mathbf{y}) = \hat{H}(\mathbf{y}) - \hat{H}(\mathbf{y}|\mathbf{x}) = \hat{H}(\mathbf{y}) - \hat{H}(\mathbf{e}|\mathbf{x}) \ge \hat{H}(\mathbf{y}) - \hat{H}(\mathbf{e}) = \hat{H}(\mathbf{y}) - h_b(\hat{\epsilon}) \to 1 \text{ bit} - h_b(\hat{\epsilon})$ (7)

where $\hat{\epsilon}$ is the empirical error probability of the sequence e.
Since $X_n$ is i.i.d. Ber(1/2) so is $Y_n$, and the limit follows
from the law of large numbers and the continuity of $H(\cdot)$.
In [5] we consider the discrete channel with state sequence
presented by [3] where the sequence is potentially determined
by an adversary knowing the past channel inputs and outputs
(as opposed to a fixed sequence assumed in [3]), and show
by similar arguments that the same communication rates can
be attained. This result is a superset of the results of [3] and
[2], and is new, to our knowledge. Applying Theorem 1 the
proof is simple: it only remains to show through a probabilistic
calculation that the difference between the empirical mutual
information and the target rate (the state averaged mutual
information defined in [3]) converges to 0 in probability.
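The chain of inequalities in Eq. (7) can be checked numerically for the binary channel. The sketch below assumes a bursty error pattern with 10% ones, chosen arbitrarily; the function names are introduced here for illustration. It compares the empirical mutual information with the lower bound given by the empirical output entropy minus the binary entropy of the empirical error rate.

```python
import math
import random
from collections import Counter


def entropy(seq):
    """Zero-order empirical entropy of a sequence, in bits."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def binary_channel_rates(x, e):
    """Return (empirical mutual information, H(y) - H(e)) for y = x XOR e, in bits."""
    y = [a ^ b for a, b in zip(x, e)]
    # I(x;y) = H(x) + H(y) - H(x,y), all computed from empirical distributions.
    mi = entropy(x) + entropy(y) - entropy(list(zip(x, y)))
    return mi, entropy(y) - entropy(e)


rng = random.Random(0)
n = 10000
x = [rng.randint(0, 1) for _ in range(n)]       # input drawn i.i.d. Ber(1/2)
e = [1] * (n // 10) + [0] * (n - n // 10)       # bursty error pattern, 10% ones
mi, lower = binary_channel_rates(x, e)
print(f"I(x;y) = {mi:.4f} bit, H(y) - h_b(eps) = {lower:.4f} bit")
```

For a binary sequence e the empirical entropy equals the binary entropy function evaluated at the relative number of ones, so the second value is the quantity appearing before the limit in Eq. (7).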

A more anecdotal particular case is the additive Gaussian
channel where by Theorem 2 and Lemma 4 of [5] we obtained
a very simple proof for the achievability part of this channel's
capacity, using simple and geometrical considerations without
the heavy machinery of AEPs or error exponents, and by
employing a maximum correlation factor decoder rather than the
maximum likelihood (minimum Euclidean distance) decoder.

VIII. Further study 

This work lays the foundations and introduces the new con- 
cept of individual channels together with basic achievability 
results. Following that, there are many open questions. To 
name the most outstanding ones: 

• Extensions of the model to include time dependency

• Definition of the empirical mutual information for contin- 
uous alphabets, and extension of the scheme to approach 
this empirical mutual information. Unification of the 
discrete and continuous cases, and extension to multiple 
input/output channels. 

• Analysis of the overheads and their dependence on the
model complexity (asymptotic rate-overhead tradeoff)

• Best asymptotic error rates

• Determining and adjusting the channel input to channel 
behavior (e.g. by adjusting the prior), and considering 
alternatives to the strict constraint imposed here on the 
input prior 

• Outer bounds on achievable rates 

• The minimal amount of randomization required to attain 
the empirical mutual information 

In [5] we give additional details and make some initial
comments about these directions.